FBGEMM_GPU
FBGEMM_GPU (FBGEMM GPU Kernels Library) is a collection of high-performance PyTorch GPU operator libraries for training and inference. The library provides efficient table batched embedding bag, data layout transformation, and quantization support.

FBGEMM_GPU is currently tested with CUDA 11.7.1 and 11.8 in CI, and with PyTorch packages that are built against those CUDA versions.

Only Intel/AMD CPUs with AVX2 extensions are currently supported.

Build Instructions

This section is intended for FBGEMM_GPU developers. The full build instructions for the CUDA, ROCm, and CPU-only variants of FBGEMM_GPU can be found here.

Installation

Install through PIP

The prebuilt wheels currently support only sm70/80 (V100/A100 GPUs):

    # Release GPU
    conda install pytorch cuda -c pytorch -c "nvidia/label/cuda-11.7.1"
    pip install fbgemm-gpu

    # Release CPU-only
    conda install pytorch cuda -c pytorch -c "nvidia/label/cuda-11.7.1"
    pip install fbgemm-gpu-cpu

    # Nightly GPU
    conda install pytorch cuda -c pytorch-nightly -c "nvidia/label/cuda-11.7.1"
    pip install fbgemm-gpu-nightly

    # Nightly CPU-only
    pip install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
    pip install fbgemm-gpu-nightly-cpu

Running FBGEMM_GPU

The tests (in the test folder) and benchmarks (in the bench folder) are good examples of how to use FBGEMM_GPU. To run the tests or benchmarks after building FBGEMM_GPU (if tests or benchmarks are built), use the following commands:

    # run the tests and benchmarks of table batched embedding bag op,
    # data layout transform op, quantized ops, etc.
    cd test
    python split_table_batched_embeddings_test.py
    python quantize_ops_test.py
    python sparse_ops_test.py
    python split_embedding_inference_converter_test.py
    cd ../bench
    python split_table_batched_embeddings_benchmark.py

To run the tests and benchmarks on a GPU-capable device in CPU-only mode, set CUDA_VISIBLE_DEVICES=-1:

    CUDA_VISIBLE_DEVICES=-1 python split_table_batched_embeddings_test.py

Run the tests on ROCm

Please add the FBGEMM_TEST_WITH_ROCM=1 flag when running tests on ROCm:

    cd test
    FBGEMM_TEST_WITH_ROCM=1 python split_table_batched_embeddings_test.py

Benchmark Example

    cd bench
    python split_table_batched_embeddings_benchmark.py uvm

Documentation

How FBGEMM_GPU works

For a high-level overview, design philosophy, and brief descriptions of various parts of FBGEMM_GPU, please see our Wiki (work in progress).

We have extensively used comments in our source files. The best and most up-to-date documentation is available in the source files.

Building the API Documentation

See docs/README.md.

Join the FBGEMM_GPU Community

For questions or feature requests, please file a ticket over on GitHub Issues or reach out to us on the #fbgemm channel in PyTorch Slack.

For contributions, please see the CONTRIBUTING file for ways to help out.

License

FBGEMM_GPU is BSD licensed, as found in the LICENSE file.
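
Supplementary sketch: scripting the test runs

The test commands under Running FBGEMM_GPU, together with the CUDA_VISIBLE_DEVICES=-1 and FBGEMM_TEST_WITH_ROCM=1 settings, could be wrapped in a small driver script. The following is a minimal sketch, not part of FBGEMM_GPU itself: the helper names build_test_env and run_tests are hypothetical, and it assumes the scripts live in a test directory relative to the current working directory, as in the commands above.

```python
import os
import subprocess
import sys

# Test scripts named in the "Running FBGEMM_GPU" section.
TEST_SCRIPTS = [
    "split_table_batched_embeddings_test.py",
    "quantize_ops_test.py",
    "sparse_ops_test.py",
    "split_embedding_inference_converter_test.py",
]


def build_test_env(cpu_only=False, rocm=False, base_env=None):
    """Return a copy of the environment with the requested flags set.

    Hypothetical helper: it only sets the two variables the README mentions.
    """
    env = dict(os.environ if base_env is None else base_env)
    if cpu_only:
        # Hide all GPUs so the tests run in CPU-only mode.
        env["CUDA_VISIBLE_DEVICES"] = "-1"
    if rocm:
        # Flag the README asks for when running tests on ROCm.
        env["FBGEMM_TEST_WITH_ROCM"] = "1"
    return env


def run_tests(test_dir="test", cpu_only=False, rocm=False):
    """Run each script inside test_dir; return the scripts that failed."""
    env = build_test_env(cpu_only=cpu_only, rocm=rocm)
    failed = []
    for script in TEST_SCRIPTS:
        # Use the current interpreter so the active environment is reused.
        result = subprocess.run([sys.executable, script], cwd=test_dir, env=env)
        if result.returncode != 0:
            failed.append(script)
    return failed
```

For example, calling run_tests(cpu_only=True) from the directory that contains the test folder would run the four scripts with GPUs hidden and return the names of any that exited with a non-zero status.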